<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!--Converted with LaTeX2HTML 96.1-h (September 30, 1996) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HTML>
<HEAD>
<TITLE>Introduction</TITLE>
<META NAME="description" CONTENT="Introduction">
<META NAME="keywords" CONTENT="Surrogates">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<LINK REL=STYLESHEET HREF="Surrogates.css">
</HEAD>
<BODY bgcolor=#ffffff LANG="EN" >
 <A NAME="tex2html75" HREF="node2.html"><IMG WIDTH=37 HEIGHT=24 ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A> <A NAME="tex2html73" HREF="Surrogates.html"><IMG WIDTH=26 HEIGHT=24 ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A> <A NAME="tex2html67" HREF="Surrogates.html"><IMG WIDTH=63 HEIGHT=24 ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A>   <BR>
<B> Next:</B> <A NAME="tex2html76" HREF="node2.html">Detecting weak nonlinearity</A>
<B>Up:</B> <A NAME="tex2html74" HREF="Surrogates.html">Surrogate time series</A>
<B> Previous:</B> <A NAME="tex2html68" HREF="Surrogates.html">Surrogate time series</A>
<BR> <P>
<H1><A NAME="SECTION00010000000000000000">Introduction</A></H1>
<P>
A nonlinear approach to analysing time series
data&nbsp;[<A HREF="node36.html#SFI">1</A>, <A HREF="node36.html#coping">2</A>, <A HREF="node36.html#abarbook">3</A>, <A HREF="node36.html#ourbook">4</A>, <A HREF="node36.html#habil">5</A>] can be motivated by two distinct
reasons.  One is intrinsic to the signal itself while the other is due to
additional knowledge we may have about the nature of the observed
phenomenon. As for the first motivation, it might be that the arsenal of linear
methods has been exploited thoroughly but all the efforts left certain
structures in the time series unaccounted for. As for the second, a system may
be known to include nonlinear components and therefore a linear description
seems unsatisfactory in the first place. Such an argument is often heard, for
example, in brain research: nobody expects the brain to be a
linear device. In fact, there is ample evidence for nonlinearity, in particular
in small assemblies of neurons.  Nevertheless, the latter reasoning is rather
dangerous. The fact that a system contains nonlinear components does not prove
that this nonlinearity is also reflected in a specific signal we measure from
that system. In particular, we do not know if it is of any practical use to go
beyond the linear approximation when analysing the signal. After all, we do not
want our data analysis to reflect our prejudice about the underlying system but
to represent a fair account of the structures that are present in the
data. Consequently, the application of nonlinear time series methods has to be
justified by establishing nonlinearity in the time series.
<P>
Suppose we had measured the signal shown in Fig.&nbsp;<A HREF="node1.html#figarspikes">1</A> in some
biological setting. Visual inspection immediately reveals nontrivial structure
in the serial correlations. The data fail a test for Gaussianity, thus ruling
out a Gaussian linear stochastic process as their source.  Depending on the
assumptions we are willing to make on the underlying process, we might suggest
different origins for the observed strong &quot;spikiness&quot; of the dynamics.
Superficially, low dimensional chaos seems unlikely due to the strong
fluctuations, but high dimensional dynamics remains a possibility: a large collection of
neurons could intermittently synchronise to give rise to the burst episodes. In
fact, certain artificial neural network models show qualitatively similar
dynamics.  The least interesting explanation, however, would be that all the
spikiness comes from a distortion by the measurement procedure and all the
serial correlations are due to linear stochastic dynamics. Occam's razor tells
us that we should be able to rule out such a simple explanation before we
venture to construct more complicated models.
<P>
Surrogate data testing attempts to find the least interesting explanation
that cannot be ruled out based on the data. For the data
shown in Fig.&nbsp;<A HREF="node1.html#figarspikes">1</A>, this would be the hypothesis that they
have been generated by a stationary Gaussian linear stochastic process
(equivalently, an <EM>autoregressive moving average</EM> or ARMA process) that is
observed through an invertible, static, but possibly nonlinear observation
function:
<BR><IMG WIDTH=500 HEIGHT=16 ALIGN=BOTTOM ALT="equation1015" SRC="img1.gif"><BR> 
Neither the orders <I>M</I>, <I>N</I>, the ARMA coefficients, nor the function <IMG WIDTH=22 HEIGHT=24 ALIGN=MIDDLE ALT="tex2html_wrap_inline1908" SRC="img2.gif"> are
assumed to be known. Without explicitly modeling these parameters, we still
know that such a process would show characteristic linear correlations
(reflecting the ARMA structure) and a characteristic single time probability
distribution (reflecting the action of <IMG WIDTH=22 HEIGHT=24 ALIGN=MIDDLE ALT="tex2html_wrap_inline1908" SRC="img2.gif"> on the original Gaussian
distribution). Figure&nbsp;<A HREF="node1.html#figarspikes_surr">2</A> shows a surrogate time series
that is designed to have exactly these properties in common with the data but
to be as random as possible otherwise. By a proper statistical test we can now
look for additional structure that is present in the data but not in the
surrogates.
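<P>
A minimal sketch of how such a surrogate could be constructed, assuming NumPy
and an iterative scheme that alternates between imposing the Fourier amplitudes
of the data (which fix the linear correlations) and imposing the sorted data
values (which fix the single time distribution). The function name
<TT>iaaft_surrogate</TT> and its parameters are illustrative assumptions and are
not taken from any particular package.

```python
import numpy as np

def iaaft_surrogate(s, n_iter=100, rng=None):
    """Return a permutation of s whose power spectrum approximates that of s."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(s, dtype=float)
    sorted_vals = np.sort(s)               # target single time distribution
    target_amp = np.abs(np.fft.rfft(s))    # target Fourier amplitudes
    r = rng.permutation(s)                 # start from a random shuffle
    for _ in range(n_iter):
        # Step 1: impose the target amplitudes, keep the current phases.
        phases = np.angle(np.fft.rfft(r))
        filtered = np.fft.irfft(target_amp * np.exp(1j * phases), n=len(s))
        # Step 2: impose the target distribution by rank ordering.
        ranks = np.argsort(np.argsort(filtered))
        new_r = sorted_vals[ranks]
        if np.array_equal(new_r, r):       # fixed point reached, stop early
            break
        r = new_r
    return r
```

Because the last step is the rank ordering, the returned sequence has exactly
the same values as the data, while the spectrum is matched only approximately.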
<P>
<blockquote><A NAME="919">&#160;</A><IMG WIDTH=360 HEIGHT=154 ALIGN=BOTTOM ALT="figure1017" SRC="img3.gif"><BR>
<STRONG>Figure 1:</STRONG> <A NAME="figarspikes">&#160;</A>
   A time series showing characteristic bursts.<BR>
</blockquote>
<P>
<blockquote><A NAME="921">&#160;</A><IMG WIDTH=360 HEIGHT=154 ALIGN=BOTTOM ALT="figure1018" SRC="img4.gif"><BR>
<STRONG>Figure 2:</STRONG> <A NAME="figarspikes_surr">&#160;</A>
   A surrogate time series that has the same single time probability
   distribution and the same autocorrelation function as the sequence in
   Fig.&nbsp;<A HREF="node1.html#figarspikes">1</A>. The bursts are fully explained by these two
   properties.<BR>
</blockquote>
<P>
In the case of the time series in Fig.&nbsp;<A HREF="node1.html#figarspikes">1</A>, there is no
additional structure since it has been generated by the rule
<BR><IMG WIDTH=500 HEIGHT=19 ALIGN=BOTTOM ALT="equation1019" SRC="img5.gif"><BR>
where <IMG WIDTH=30 HEIGHT=24 ALIGN=MIDDLE ALT="tex2html_wrap_inline1912" SRC="img6.gif"> are Gaussian independent increments and <IMG WIDTH=10 HEIGHT=7 ALIGN=BOTTOM ALT="tex2html_wrap_inline1914" SRC="img7.gif"> is chosen
so that the data have unit variance.<A NAME="tex2html3" HREF="footnode.html#41"><IMG  ALIGN=BOTTOM ALT="gif" SRC="foot_motif.gif"></A>
This means that the strong nonlinearity that generates the bursts is due solely to the
distorting measurement, which enhances ordinary fluctuations generated by linear
stochastic dynamics.
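<P>
The mechanism can be illustrated with a short sketch, assuming NumPy. The
specific choices below (a first order autoregressive process with coefficient
0.9, observed through a cubic function) are assumptions made for illustration
only; they are one example of a linear Gaussian process seen through a static,
invertible nonlinearity and are not claimed to reproduce the exact rule given
in the equation above.

```python
import numpy as np

def bursty_series(n=2000, a=0.9, n_transient=500, rng=None):
    """Linear Gaussian AR(1) process observed through a cubic measurement function."""
    rng = np.random.default_rng() if rng is None else rng
    eta = rng.standard_normal(n + n_transient)  # Gaussian independent increments
    x = np.zeros(n + n_transient)
    for i in range(1, len(x)):
        x[i] = a * x[i - 1] + eta[i]            # linear stochastic dynamics
    x = x[n_transient:]                         # discard the initial transient
    s = x ** 3                                  # static, invertible, nonlinear measurement
    return s / s.std()                          # rescale to unit variance
```

Plotting the output of <TT>bursty_series()</TT> should show quiet stretches
interrupted by bursts, although the dynamics of <TT>x</TT> is purely linear.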
<P>
In order to systematically exclude simple explanations for time series
observations, this paper will discuss formal statistical tests for
nonlinearity. We will formulate suitable null hypotheses for the underlying
process or for the observed structures themselves. In the former case, null
hypotheses will be extensions of the statement that the data were generated by
a Gaussian linear stochastic process. The latter situation may occur when it
is difficult to properly define a class of possible underlying processes but we
want to check if a particular set of observables gives a complete account of
the statistics of the data. We will attempt to reject a null hypothesis by
comparing the value of a nonlinear parameter taken on by the data with its
probability distribution under the null hypothesis.  Since only exceptional cases allow for the exact or
asymptotic derivation of this distribution unless strong additional assumptions
are made, we have to estimate it by a Monte Carlo resampling technique. This
procedure is known in the nonlinear time series literature as the method of
<EM>surrogate data</EM>, see Refs.&nbsp;[<A HREF="node36.html#theiler1">6</A>, <A HREF="node36.html#theiler-sfi">7</A>, <A HREF="node36.html#fields">8</A>]. Most of the
body of this paper will be concerned with the problem of generating an
appropriate Monte Carlo sample for a given null hypothesis.
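<P>
The comparison of the data with a Monte Carlo sample can be sketched as
follows, assuming NumPy. The time reversal asymmetry statistic used here is
only one possible nonlinear parameter, and all function names are illustrative
assumptions. The surrogates are expected to be supplied by the caller and to
be realisations of the null hypothesis under test.

```python
import numpy as np

def time_reversal_asymmetry(s, lag=1):
    """Third moment of lagged differences; changes sign when s is reversed in time."""
    s = np.asarray(s, dtype=float)
    return float(np.mean((s[lag:] - s[:-lag]) ** 3))

def rank_p_value(t_data, t_surrogates):
    """One-sided Monte Carlo p-value from the rank of the data among the surrogates."""
    t_surrogates = np.asarray(t_surrogates, dtype=float)
    n_as_large = int(np.sum(t_surrogates >= t_data))
    # Under the null hypothesis the data are one more realisation,
    # hence the +1 in numerator and denominator.
    return (n_as_large + 1) / (len(t_surrogates) + 1)

def surrogate_test(data, surrogates, statistic=time_reversal_asymmetry):
    """Compare the absolute value of the statistic on data and surrogates."""
    t_data = abs(statistic(data))
    t_surr = [abs(statistic(x)) for x in surrogates]
    return rank_p_value(t_data, t_surr)
```

With 19 surrogates, the smallest attainable p-value is 1/20, so the null
hypothesis is rejected at the 5% level only if the data yield a larger value
than every surrogate.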
<P>
We will also dwell on the proper interpretation of the outcome of such a test.
Formally speaking, this is totally straightforward: A rejection at a given
significance level means that if the null hypothesis is true, there is a certain
small probability of still seeing the structure we detected.  Non-rejection means
even less: either the null hypothesis is true, or the discriminating statistic
we are using fails to have power against the alternative realised in the
data. However, one is often tempted to go beyond this simple reasoning and
speculate either on the nature of the nonlinearity or non-stationarity that
led to the rejection, or on the reason for the failure to reject.
<P>
Since the actual quantification of nonlinearity turns out to be the easiest --
or in any case the least dangerous -- part of the problem, we will discuss it
first.  In principle, any nonlinear parameter can be employed for this purpose.
They may however differ dramatically in their ability to detect different kinds
of structures. Unfortunately, selecting the most suitable parameter has to be
done without making use of the data since that would render the test incorrect:
If the measure of nonlinearity has been optimised formally or informally with
respect to the data, a fair comparison with surrogates is no longer
possible. Only information that is shared by data and surrogates, that is, for
example, linear correlations, may be considered for guidance.  If multiple data
sets are available, one could use some sequences for the selection of the
nonlinearity parameter and others for the actual test.  Otherwise, it is
advantageous to use one of the parameter-free methods that can be set up with
very little detailed knowledge of the data.
<P>
Since we advocate the routine use of a nonlinearity test whenever
nonlinear methods are to be applied, we feel that it is important to
make a practical implementation of such a test easily accessible. Therefore,
one branch of the TISEAN free software package&nbsp;[<A HREF="node36.html#tisean">9</A>] is devoted to
surrogate data testing. Appendix&nbsp;<A HREF="node32.html#appA">A</A> will discuss the implementational
aspects necessary to understand what the programs in the package do.
<P>
<HR>
<P><ADDRESS>
<I>Thomas Schreiber <BR>
Mon Aug 30 17:31:48 CEST 1999</I>
</ADDRESS>
</BODY>
</HTML>
